New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC 0171] Default name of fetchFromGithub FOD to include revision #171
base: master
Are you sure you want to change the base?
Conversation
Co-authored-by: Robert Schütz <github@dotlambda.de>
17a0a70
to
7534dd4
Compare
Just one silly question: why only
|
I did reference this rfcs/rfcs/0171-fetch-from-github.md Line 104 in 1ac472d
I mention fetchFromGitHub because it's about 20x as common:
I'm not opposed to doing a similar treatment to the other fetchFromX helpers. I think this RFC could be expanded to encompass them all eventually. To your point, maybe this should be relabeled as "Default name of fetchFromX FOD helpers to include revision" |
There's already a speculative impl as of 2 weeks ago: NixOS/nixpkgs#294068 Though the format is slightly different ( |
I choose the format to make it easier to inspect which source FOD is going wrong when several packages got rebuild at the same time. Update: Just came across NixOS/nixpkgs#49862, which seems to work on the package |
rfcs/0171-fetch-from-github.md
Outdated
- "Interchangeability" with other fetchers is diminished as the derivation name is different | ||
- In practice, fetchFromGitHub is never used in this way. It is generally the only fetcher, so there is never another FOD to dedupilicate. | ||
- Out-of-tree repositories may get hash mismatch errors | ||
- If the cause of the mismatch is staleness, this is good and working as intended | ||
- If the cause is non-determinism, this is frustrating. | ||
- Some derivations assume "source" to be the name of sourceRoot | ||
- This has been mitigated over two years within Nixpkgs | ||
- Out-of-tree code may break if they assume "source" is the name | ||
- Can be mitigated with release notes describing the issue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are caused by the assumption of src.name == "source"
being broken.
However, we have already deprecated such assumption as early as Nixpkgs 21.11. Nixpkgs Manual now requires sourceRoot
to start with "${src.name}"
instead of "source"
when src
is constructed with by fetchgit
-based fetchers, and tree-wide conversions (NixOS/nixpkgs#245388, NixOS/nixpkgs#247977, NixOS/nixpkgs#248528, NixOS/nixpkgs#248683, NixOS/nixpkgs#294334) have been merged.
fetchFromGitHub
and other output-as-a-directory fetchers can still be used interchangeably if we stick to ${src.name}
instead of "source"
.
In my opinion, we do not provide bug-level compatibility to packages failing to follow already-stablized specifications, in-tree or out-of-tree.
IMO, "fetchers for version-controlled repositories" would be a suitable target. This includes fetchers based on specific version control systems (e.g. |
Hehe per #133 (comment), my dream of single hash git fetchers (avoiding the need to put things in the name like this) is now a good bit closer! |
Congratulations! We still need to fix those fetchers at Nixpkgs level, though, since Nixpkgs often takes years to adopt new Nix features. |
- Full commit hashes could be truncated. This sacrifices a bit of simplicity for better looking derivation names: | ||
``` | ||
let | ||
version = builtins.replaceStrings [ "refs/tags/" ] [ "" ] rev; | ||
# Git commit hashes are 40 characters long, assume that very long versions are version-control labels | ||
ref = if (builtins.stringLength rev) > 15 then builtins.substring 0 8 version else version; | ||
in lib.sanitizeDerivationName "${repo}-${ref}-src"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't recommend this. Short hashes are not guaranteed to be stable for long term storage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
40 characters are not really that long for a machine-generated derivation.
People would most likely see the store hash and (hopefully) part of the name
when it flashes through their terminal. During debugging, a path like that would occupy at most a bit more than one line inside the terminal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current "source" isn't that stable either. 8 characters is still a fair amount of entropy, and likely to be different enough for most repositories.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also remember, that it doesn't have to be "unique across all time". It just needs to be different than what was there before, so that the combination of name + hash are different.
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/rfcsc-meeting-2024-04-02/42643/1 |
I'd like to nominate myself as a shepherd. I previously authored a related implementation, NixOS/nixpkgs#294068, which could inform the discussion around this RFC's design. I'm excited about this RFC and look forward to working with the community to shepherd it through the process. |
Co-authored-by: Yueh-Shun Li <shamrocklee@posteo.net>
If we have a shepards meeting, we can refine the scope. |
Yes, NixOS/nixpkgs#49862 is very related, in its latest reincarnation it allows you to keep both |
To code the fetcher name into the |
# Detailed design | ||
[design]: #detailed-design | ||
|
||
Change the default name of fetchFromGithub fetcher from `"source"` to `lib.sanitizeDerivationName "${repo}-${rev}-src"`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have two suggestions:
- How about using
pname
instead of repo as a sort of middle ground between stability and informative names. Of course then changing the repo url will not automaticaly verify if the hash actually matches, so a tradeoff. - Also how about starting with
src-
instead of at the end? When listing the nix store, all fetched inputs can then be sorted together.
The second suggestion is actually a part of a wish I have to prepend src-
to all build inputs (tars, patch files, builders, etc), pkg-
to all packages, and nixos-
to the generated nixos configs, so that it is easier to tell at a glance why a particular store path is being installed, and also would be useful for tools like nvd when listing diffs. Yeah this would a much broader proposal, but one can dream someday it will happen. For now, we can start with just the FOD fetchers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using pname instead of repo as a sort of middle ground between stability and informative names. Of course then changing the repo url will not automaticaly verify if the hash actually matches, so a tradeoff.
pname
is a stdenv.mkDerivation convention, this would require people to pass it separately. The goal here also is to avoid hash mismatch, so I don't think is this is enough to solve the problem this RFC is trying to remedy.
Also how about starting with src- instead of at the end? When listing the nix store, all fetched inputs can then be sorted together.
I wouldn't be opposed to it, but I'm used to <pname>-<version>-<variant>
naming, source being a variant in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The second suggestion is actually a part of a wish I have to prepend src- to all build inputs (tars, patch files, builders, etc), pkg- to all packages, and nixos- to the generated nixos configs, so that it is easier to tell at a glance why a particular store path is being installed, and also would be useful for tools like nvd when listing diffs. Yeah this would a much broader proposal, but one can dream someday it will happen. For now, we can start with just the FOD fetchers.
It sounds like another Nix RFC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm used to
<pname>-<version>-<variant>
naming, source being a variant in this case.
It's a bit strange to consider package source a "variant" of the package. Nevertheless, such name-suffixing pattern exists in Nixpkgs -- the vendor FOD assign to goModules
attribute of buildGoModule
has its name
set as "${pname}-${version}-go-modules"
, and sometimes people manually set the name of in-attribute FODs this way.
On the other hand, language-specific packages (e.g. Python packages, Ocaml packages, etc.) often prefix the name with the name and version of the compiler/interpreter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not specifically tied to the suffix approach.. I'll add a comment to the initial comment
Co-authored-by: Pieter Gerets <123326988+gerpiet@users.noreply.github.com>
People who use the result FOD produced by As an enforcible specification, maybe we don't even need to specify what (Whether/how long the revision could be truncated are still in this scope, since it affects the probabity of colisions.) |
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: |
This RFC has not acquired enough shepherds. This typically shows lack of interest from the community. In order to progress a full shepherd team is required. Consider trying to raise interest by posting in Discourse, talking in Matrix or reaching out to people that you know. If not enough shepherds can be found in the next month we will close this RFC until we can find enough interested participants. The PR can be reopened at any time if more shepherd nominations are made. |
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/rfcsc-meeting-2024-04-16/43512/1 |
I'll toss my name in for an RFC shepherd, as I believe in the utility of this proposal. |
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: |
@oxij I would like to nominate you as a shepherd, looks like you've thought about this longer than I've been involved in nixpkgs. I like your addition of a lib function to make derivation names more consistent in NixOS/nixpkgs#49862 |
Thanks for a nomination, but I don't think this RFC is a proper solution to the stated problem.
Eelco and others objecting to the `<hash>-<name>-<version or revision>-source` naming scheme (and to the original 2018 version of #49862 which wanted to do exactly this, but for all fetchers, not just `fetchFromGitHub`, and, primarily, for a different reason of `/nix/store` discoverability) do have a point: the current most common `<hash>-source` naming scheme is awfully convenient for running a build server (like Hydra) and when you have to frequently switch between fetchers (e.g. when using out-of-Nixpkgs derivations and/or flakes).
IMHO, the problem this RFC tries to address is actually a set of features/bugs/issues of Nix itself:
- Nix reuses existing fixed-output outputs between different derivations without actually checking the derivation actually builds into an output with the given output hash; which is certainly a feature, given how useful it is for switching between different fetchers;
- but, IIRC, last time I tried, explicit `nix-store -r --check <derivation>` over such a derivation will fail to re-build and re-check it; which is either a bug, or there should be a separate option for doing this; for simplicity, let's assume we want a new option for this, and we want to call it `--force-rebuild`;
- `nix-build` and `nix build` lack `--force-rebuild` option (similarly to how they lack `--check` option too), you have to call `nix-store -r` manually (an issue).
(Now that I think about it, I should probably open Nix issues for these. Later, I'm a bit busy ATM.)
If the latter two issues were not such, then you could just do `nix-build --force-rebuild -A hello.src` instead of any of this, and OfBorg could also run that on all changed fixed-output derivations in a PR, ensuring the author did not screw it up regardless of any (non-)changes to derivation names.
Which would be a better solution to the stated problem, since this RFC will fail to solve it for derivations with custom `name`s anyway.
In the meantime, IMHO, since changes like #245388, #247977, #248528, #248683 are generally useful and get merged without objections, I think you should apply #49862 to your Nixpkgs, split yet-unfixed changes to `sourceRoot`s from your #153386 into a separate patchset (I think I covered most of your changes with my PRs, but I probably missed at least some of those changes), rebuild your system with #49862 applied and `nameSourcesPrettily = true` or `nameSourcesPrettily = full` set, and PR any changes you need to make it all work.
Posting a nice review in #49862 so that it would get a chance to actually get merged eventually would be nice too, I suppose.
Then, if nobody ever wants to work on the Nix issues described above, and #49862 gets merged, we could discuss setting `nameSourcesPrettily = "full"` for OfBorg as a workaround.
And, IMHO, it should be `"full"`, not `true` (which is essentially what this RFC proposes) so that changing between `fetchgit`, `fetchFromGitHub`, `fetchFromGitLab`, etc would force a recheck too.
|
I certainly think this conversation needs to be revived because of the security implications of
which leaves us open to a potential cache poisoning attack on |
@risicle You are correct, it does have security implications, but even doing the equivalent of `nameSourcesPrettily = "full"` of NixOS/nixpkgs#49862 wouldn't really help in the most general case: @tobiasBora in NixOS/ofborg#68 (comment) shows how to easily poison OfBorg's cache.
Obviously, it won't work for poisoning Hydra, since the evil PR of his method won't ever get merged.
But now thinking about this, I have an IMHO much simpler and thus scarier attack that _will_ work against Hydra:
- Malice makes an evil PR with a package update that changes the source's revision to a new version but points the hash to a hash of an older version of the same package, a version which has a well-known CVE.
- A very paranoid Nixpkgs maintainer Alice (who does not use any build caches) looks at it, it looks fine, so she builds it, and it works fine, since Alice has the older version of the source in `/nix/store`, and so she merges it.
- All tools that track unfixed CVEs in Nixpkgs only look at package versions, so those will be fooled too.
- Malice can now exploit the CVE.
In fact, if that source derivation also has a custom `name` (which is not uncommon), then _nothing_ will catch this (including me, with all my machines not using the Hydra cache and all using the equivalent of `nameSourcesPrettily = "full"` since 2018).
A scary thought.
So, I guess NixOS/nix#969 is actually a critical bug, then.
|
You're close to the scenario detailed by that author to the security team. Ultimately we can never completely stop a malicious change missing the attention of a reviewer (and this goes for all projects, hashed caching scheme or not), but we can make it so the submitter has to jump through some more hoops that will hopefully make the PR look weirder and provoke more attention. Hence small ("small") changes like this. |
The objection(NixOS/nixpkgs#49862 (comment) and NixOS/nixpkgs#59858 (comment)) is no longer relevant, as |
I still think these are two different points:
- Changing most source derivation names no longer breaks most builds, yes.
(Some do depend on customly named source derivations still, AFAIK. E.g. Apache Arrow, Datafusion, etc.)
(Also, as the author of both ccbb065c88080966e14872ea9e0ac577f753d902 and 775f21b9fdc1f2bb46518a575339fac8ff867959 in Nixpkgs I feel it kind of a conflict of interest to push for this for those reasons.)
- But moving away from `*-source` naming scheme will still make using flakes inconvenient.
|
Custom
Good point! That would be an unfortunate cache miss. Nevertheless, isn't it "the Nixy way" to pursue dependency isolation even at the cost of rebuilds (i.e. changing the version of compiler used to compile Bash triggers a world rebuild)? |
I wrote a long post with pros/cons of different possible solutions in NixOS/nix#969 (comment)
So, to
That would be an unfortunate cache miss. Nevertheless, isn't it "the Nixy way" to pursue dependency isolation even at the cost of rebuilds (i.e. changing the version of compiler used to compile Bash triggers a world rebuild)?
my answer is that, personally, I would like my build machines to always check all new fixed-output things coming from Nixpkgs, but never check those made by me (unless I ask for it explicitly).
The linked comment describes one way to implement that.
|
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/rfcsc-meeting-2024-05-14/45414/1 |
Make fetchFromGithub FOD name more meaningful. This avoids stale artifacts and gives more content fidelity when looking at nix store paths.
Rendered: https://github.com/jonringer/rfcs/blob/jringer/fetch-from-github/rfcs/0171-fetch-from-github.md
Items for further refinement:
src
label